ABSTRACT

The empirical HOD model of Wang et al. 2006 fits, by construction, both the stellar mass function and the correlation function of galaxies in the local Universe. In contrast, the semi-analytical models of De Lucia & Blaizot 2007 (DLB07) and Guo et al. 2011 (Guo11), built on the same dark matter halo merger trees as the empirical model, still have difficulty reproducing these observational data simultaneously. We compare the relations between the stellar mass of galaxies and their host halo mass in the three models, and find that they are different. When the relations are rescaled to have the same median values and the same scatter as in Wang et al., the rescaled DLB07 model can fit both the measured galaxy stellar mass function and the correlation functions measured in different galaxy stellar mass bins. In contrast, the rescaled Guo11 model still over-predicts the clustering of low-mass galaxies. This indicates that the details of how galaxies populate the scatter in the stellar mass-halo mass relation play an important role in determining the correlation functions of galaxies. While the stellar mass of galaxies in the Wang et al. model depends only on halo mass and is randomly distributed within the scatter, in semi-analytical models galaxy stellar mass also depends on halo formation time. At a fixed value of infall mass, galaxies that lie above the median stellar mass-halo mass relation reside in haloes that formed earlier, while galaxies that lie below the median relation reside in haloes that formed later. This effect is much stronger in Guo11 than in DLB07, which explains the over-clustering of low-mass galaxies in Guo11. Our results illustrate that the assumption of random scatter in the relation between stellar and halo mass, as employed by current HOD and abundance matching models, may be problematic if significant assembly bias exists in the real Universe.

Key words: galaxies: haloes - galaxies: formation - cosmology: large-scale structure 
of Universe 
1 INTRODUCTION

In the currently favoured scenario for structure formation, galaxies are believed to form by gas condensation within the potential wells of dark matter haloes that form and evolve in a hierarchical bottom-up fashion: small haloes form first and later merge to form more massive systems. Different methods have been developed to link the physical properties of galaxies (such as their stellar mass and/or luminosity) to the properties of their host haloes. These methods include the traditional Halo Occupation Distribution (HOD) models, whose ingredients are: (i) the probability distribution relating the mass of a dark matter halo to the number of galaxies that form within that halo, and (ii) the spatial distribution of galaxies within their parent halo (Benson et al. 2000; Peacock & Smith 2000; Seljak 2000; Berlind & Weinberg 2002; Berlind et al. 2003).

The most recent renditions of this approach take advantage of high-resolution cosmological simulations to link the physical properties of galaxies to the dynamical properties of dark matter substructures. As subhaloes fall into a larger structure, they are subject to stripping and tidal disruption that efficiently reduce their mass. Therefore, it is natural to assume that the mass/luminosity of galaxies that reside within these substructures is correlated with the subhalo mass at the time of 'infall' (M_infall), i.e. at the time when the galaxy is, for the last time, a central galaxy of its own halo (Vale & Ostriker 2006; Conroy et al. 2006; Wang et al. 2006). The observations most commonly used to constrain the connection between galaxy properties and dark matter haloes are the galaxy stellar mass/luminosity function and the galaxy correlation function (Yang et al. 2003; Zehavi et al. 2005). The 'abundance matching' methods use only the galaxy stellar mass function (SMF) as a constraint, and derive the M_star-M_infall relation assuming a monotonic relationship between galaxy mass and halo mass (Moster et al. 2010; Guo et al. 2010).

For the M_star-M_infall relation, some models assume no scatter (Guo et al. 2010), while other models account for a random scatter around the median relation (Wang et al. 2006; Moster et al. 2010). In fact, one would naturally expect some scatter around the median relation, due to the scatter in the formation and growth histories of dark matter haloes (Zhao et al. 2003; Li et al. 2007; Zhao et al. 2009), the stochastic processes at play in galaxy formation and evolution (Valle et al. 2005; Kauffmann et al. 2006), and environmental physical processes (Goto 2003; Tanaka et al. 2003; Jaffe et al. 2011). Therefore, one could expect the scatter to be related to the physical properties of the parent dark matter haloes. When modelling the scatter in the M_star-M_infall relation, however, all authors have so far assumed a Gaussian distribution in logarithmic stellar mass (Wang et al. 2006; Moster et al. 2010): for a given halo mass, galaxy stellar masses are randomly assigned within the scatter, independently of other halo properties.

While the HOD approach assumes that the galaxy content of a halo depends only on its mass, recent studies have demonstrated that the clustering of dark matter haloes depends on their formation time (usually defined as the time when half of the final mass of the halo is first assembled in a single object). This 'assembly bias' was first pointed out by Gao et al. (2005), who used a large high-resolution simulation of the concordance Λ cold dark matter cosmogony to demonstrate that haloes less massive than about 10^13 M⊙ that assembled at high redshift are significantly more clustered than those of the same mass that assembled more recently. Subsequent studies by Zhu et al. (2006) and Croton et al. (2007) investigated the dependence of galaxy properties on halo formation time using different galaxy formation models. In particular, they found that galaxy clustering, galaxy occupation number, and the luminosity and stellar mass of central galaxies all depend on halo formation time. In addition, the stellar mass of satellite galaxies appears to depend on the FOF group mass at z = 0 (Neistein et al. 2011b).

An alternative method to study galaxy formation and evolution is provided by semi-analytic models (SAMs; White & Frenk 1991). Unlike HODs, which provide an empirical/statistical relation between galaxy properties and host halo mass, SAMs attempt to describe the physical processes at play using observationally and/or theoretically motivated prescriptions coupled to dark matter merger trees that can be constructed analytically or extracted from large cosmological N-body simulations. Given our poor understanding of the physical processes involved, and the complex interrelations between them, none of the published SAMs matches all the observed statistical properties (Neistein & Weinmann 2010; Wang et al. 2012). In this work, we take advantage of two different models that fail in different ways. The SAM of De Lucia & Blaizot (2007, DLB07) over-predicts the abundance of galaxies with low-to-intermediate stellar masses, but reproduces the two-point galaxy correlation functions in different stellar mass bins (CFs) measured in the local Universe. The SAM of Guo et al. (2011, Guo11) matches the observed galaxy stellar mass function in the local Universe, but over-predicts the CFs for galaxies less massive than 10^10.77 M⊙. The two SAMs of DLB07 and Guo11 that we use in this work are both based on the halo merger trees from the Millennium Simulation (Springel et al. 2005).

We will also use the empirical HOD model of Wang et al. (2006, hereafter Wang06), which is also based on the Millennium Simulation. In this model, galaxy positions and velocities are assigned by following the orbits and merger histories of substructures in the simulation, as is done in the SAMs. Following the empirical HOD approach, rather than using detailed physical recipes to calculate the evolution of galaxy properties, galaxy stellar mass is linked directly to the mass of the galaxy's parent dark matter halo at the time of infall, assuming a double power-law function. The parameters describing the M_star-M_infall relation are constrained by fitting both the SMF and the CFs from SDSS measurements. Therefore, by construction, Wang06 can fit both the observed SMF and the measured CFs.

In this paper, we start by studying the M_star-M_infall relation in the two SAMs of DLB07 and Guo11, and compare the relation with that of Wang06. We then construct two 'rescaled SAMs' based on DLB07 and Guo11, by simply rescaling the stellar masses in the SAMs so that the median and the amount of scatter of the M_star-M_infall relation are the same as in Wang06, while retaining the relative deviations of the model galaxies from the median relation. In this way, the rescaled SAMs and Wang06 differ only in how galaxies populate the scatter of the M_star-M_infall relation. We demonstrate that this detail significantly affects the clustering properties of galaxies.

This paper is organized as follows: in Sec. 2, we briefly introduce the models analysed in this work. In Sec. 3.1, we compare the SMF and CFs from Wang06 with predictions from the two SAMs, and analyse the M_star-M_infall relations of these three models. In Sec. 3.2, we discuss the rescaled SAMs and their predictions. In Sec. 4, we analyse how the scatter in the M_star-M_infall relation depends on halo formation time. A discussion of our findings and our conclusions are given in Sec. 5.



2 THE MODELS 

2.1 The Wang06 model 

As explained above, the empirical HOD model of Wang et al. (2006) matches, by construction, both the galaxy SMF and the CFs measured in the local Universe. This model assumes a double power-law relation between the galaxy stellar mass (M_star) and the halo mass at the time of infall (M_infall):

$$ M_{\rm star} = \frac{2k}{(M_{\rm infall}/M_0)^{-\alpha} + (M_{\rm infall}/M_0)^{-\beta}} $$

There are five free parameters describing the relation. Besides M_0, α, β and k shown above, at any given value of M_infall the scatter in log(M_star) is described by a Gaussian distribution with standard deviation σ. For this work, we have recomputed the best-fit parameters matching the latest SDSS DR7 data for the SMF and CFs (the Wang06 model was based on DR2 data). In particular, we require our model to match 29 points in the SMF, with log(M_star/h^-2 M⊙) in the range [9.0, 11.9], and 119 points in the CFs measured for five stellar mass bins¹. To account for the systematic errors in the stellar mass estimates (Li & White 2009), relative errors are set to be no smaller than the relative error at log(M_star/h^-2 M⊙) = 11.35. The best-fit parameters are: M_0 = 3.43 × 10^11 h^-1 M⊙, α = 0.34, β = 2.56, log k = 10.23 and σ = 0.169 for central galaxies; M_0 = 5.23 × 10^… h^-1 M⊙, α = 0.298, β = 1.99, log k = 10.30 and σ = 0.192 for satellite galaxies.
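To make the HOD prescription concrete, the sketch below evaluates the double power law above and draws log(M_star) with purely random Gaussian scatter, using the best-fit central-galaxy parameters quoted in this section. It is a minimal illustration under stated assumptions, not the fitting code used for Wang06. The function names are hypothetical, only the central-galaxy branch is shown, and halo masses are taken in h^-1 M⊙ with stellar masses in h^-2 M⊙.

```python
import numpy as np

# Best-fit central-galaxy parameters quoted in the text (assumed units:
# M0 in h^-1 Msun, k in h^-2 Msun). Satellites would use their own set.
M0 = 3.43e11
ALPHA = 0.34
BETA = 2.56
LOG_K = 10.23
SIGMA = 0.169  # Gaussian scatter in log10(M_star), independent of halo mass


def median_log_mstar(m_infall, m0=M0, alpha=ALPHA, beta=BETA, log_k=LOG_K):
    """log10 of the median stellar mass from the double power law."""
    x = np.asarray(m_infall, dtype=float) / m0
    mstar = 2.0 * 10.0**log_k / (x**(-alpha) + x**(-beta))
    return np.log10(mstar)


def assign_log_mstar(m_infall, sigma=SIGMA, rng=None):
    """Draw log10(M_star) with purely random Gaussian scatter about the median.

    The offset from the median depends only on a random number, so any other
    halo property (e.g. formation time) is ignored by construction.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = median_log_mstar(m_infall)
    return mu + rng.normal(0.0, sigma, size=np.shape(mu))
```

At M_infall = M_0 the median is exactly k. Well below M_0 the logarithmic slope tends to β, and well above it the slope tends to α. The key point is that the random draw depends on nothing but halo mass, which is the assumption tested in Sec. 4.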



2.2 The DLB07 and Guo11 models

For details of the two SAMs analysed in this paper, we refer to the original papers of De Lucia & Blaizot (2007) and Guo et al. (2011). The basic ingredients of the two models are quite similar. The Guo11 model differs from the DLB07 model in its treatment of satellite evolution and in its more efficient supernova feedback. As mentioned above, the DLB07 and Guo11 models use the same halo merger trees as Wang06. In particular, the dynamical properties and positions of galaxies are identical in DLB07 and Wang06. In Guo11, the treatment of satellite galaxies, in particular of dynamical friction and disruption, is slightly different, which results in slight differences in the total number and positions of satellite galaxies (see below).

All model results shown below are based on dark matter halo merger trees extracted from the Millennium Simulation (Springel et al. 2005). The resolution of the simulation corresponds to a subhalo mass limit of ~10^10 h^-1 M⊙. Guo et al. (2011) compared SAM predictions based on the Millennium Simulation with those based on the higher-resolution Millennium-II Simulation (Boylan-Kolchin et al. 2009), and showed that model results converge at stellar masses of about 6 × 10^9 M⊙.



3 THE RELATION BETWEEN GALAXY 
STELLAR MASS AND HALO MASS 

3.1 Original models 

As mentioned in Sec. 1, the Wang06 model reproduces both the SMF and CFs, while DLB07 and Guo11 do not fit both observations, although all the models are built on almost exactly the same dark matter halo merger trees. Therefore the different predictions for the SMF and CFs in the three models must be due to different relations between M_star and M_infall. As a first step, in Fig. 1 we show the SMF and CFs in the three original models, and compare them with the SDSS DR7 results (Li et al. 2006; Li & White 2009; Guo et al. 2010, 2011).

¹ 29 points for bins of log(M_star/M⊙) < 11.27 with r_p in the range [0.02, 9] h^-1 Mpc, and 11 points for the [11.27, 11.77] bin with r_p in the range [0.8, 9] h^-1 Mpc.

Note that the DLB07 model was mainly constrained by the observed K-band luminosity function, and was not tuned to reproduce the measured CFs. Fig. 1 shows that the DLB07 model actually reproduces the measured CFs in all stellar mass bins, but over-predicts the low-mass end of the SMF. The Guo11 model, on the other hand, was tuned to reproduce the observed SMF, and therefore matches these observations very well, down to the lowest galaxy stellar masses measured. However, it over-predicts the CFs of galaxies less massive than ~10^10.5 h^-2 M⊙.

We show the relation between M_star and M_infall in the left panel of Fig. 2 for central (solid lines) and satellite (dashed lines) galaxies. The two upper right panels show the ratio of the median stellar mass in each of the two SAMs to that in the Wang06 model as a function of halo mass. Clearly, the relation between M_star and M_infall differs among the three models. (i) At fixed halo mass, satellite galaxies are less massive than centrals in the Wang06 model, while satellites are as massive as centrals in the DLB07 model, and more massive than centrals at M_infall < 10^12 h^-1 M⊙ in the Guo11 model. (ii) At low halo masses, both centrals and satellites in the DLB07 model are significantly more massive than in the Wang06 model, which results in an excess of low-mass galaxies with respect to the observed SMF. In the Guo11 model, centrals in low-mass haloes have stellar masses similar to those in Wang06, and the low-mass end of the observed SMF is reproduced. (iii) At large halo masses, both SAMs predict more massive centrals and satellites than the Wang06 model, which translates into an excess of massive galaxies with respect to the observed SMF.

In the bottom right panel of Fig. 2 we show the ratio of the scatter in the M_star-M_infall relation in each of the two SAMs to that in the HOD model. In the Wang06 HOD model, the scatter around the median M_star-M_infall relation is assumed to be independent of halo mass. In the SAMs, both DLB07 and Guo11 predict larger scatter than Wang06, by up to ~40 per cent.

Different M_star-M_infall relations also result in different satellite fractions, as shown in Fig. 3. Both the DLB07 and Guo11 models have a higher satellite fraction than the HOD model, and the Guo11 model has a higher fraction of satellites than DLB07 in the mass range log(M_star/M⊙) ≈ [9.5, 10.8]. The differences in the satellite fractions can again be explained by the respective M_star-M_infall relations of central and satellite galaxies. In DLB07, satellites are more massive than in the HOD for any value of M_infall, which results in relatively more high-mass satellites. In the Guo11 model, although centrals have similar masses as in the HOD, satellites are more massive than centrals at the low-mass end, resulting in a higher fraction of satellites. In Fig. 3 we also over-plot the satellite fraction measured from the group catalogue of Yang et al. (2008) and the results from galaxy-galaxy lensing of Mandelbaum et al. (2006). Observational uncertainties are still rather large, but in general the HOD fractions are closest to the observational results, while both SAMs predict larger satellite fractions than observed (see also Lu et al. 2012).
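As an illustration of how a curve like those in Fig. 3 can be measured from a model catalogue, the hypothetical helper below computes the satellite fraction in bins of log(M_star). It is a sketch that assumes only a per-galaxy array of stellar masses and a boolean central/satellite flag.

```python
import numpy as np


def satellite_fraction(log_mstar, is_satellite, bin_edges):
    """Fraction of satellites in each log10(M_star) bin (NaN for empty bins).

    Bins are half-open, [edge_i, edge_{i+1}); galaxies outside the edges are ignored.
    """
    log_mstar = np.asarray(log_mstar, dtype=float)
    is_sat = np.asarray(is_satellite, dtype=bool)
    # np.digitize returns 1..len(edges)-1 for in-range values; shift to 0-based.
    idx = np.digitize(log_mstar, bin_edges) - 1
    nbins = len(bin_edges) - 1
    frac = np.full(nbins, np.nan)
    for i in range(nbins):
        in_bin = idx == i
        if in_bin.any():
            frac[i] = is_sat[in_bin].mean()
    return frac
```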

As noted earlier, the DLB07 and Guo11 models have slightly different satellite galaxy numbers/positions due to a different treatment of satellite mergers and disruption.
Figure 1. SMF and CFs in five stellar mass bins in the three models studied: the empirical HOD model of Wang06 (green lines), the DLB07 semi-analytic model (red lines), and the Guo11 semi-analytic model (blue lines). Observational results from SDSS DR7 are indicated by black symbols with error bars (Li et al. 2006; Li & White 2009; Guo et al. 2010, 2011). In each panel, the upper part shows the results and the lower part gives the ratios between the models and the SDSS observations. By construction, the Wang06 model reproduces both the SMF and CFs, while the two SAMs cannot.



The HOD model presented here is based on the same dynamical information used in the DLB07 model. We have tested that these differences do not affect model predictions significantly: the dotted line in Fig. 3 shows results obtained by repeating our fitting procedure using the dynamical information extracted from the Guo11 model. In this case, the satellite fraction in the HOD model is only about 0.01 lower than in the HOD model based on the DLB07 galaxies. This difference is much smaller than the differences between the SAMs and the HOD, and between the two SAMs.

As we have shown above, predictions from the DLB07 model are in quite good agreement with the observed CFs for the entire mass range sampled by the SDSS data, despite a larger fraction of satellites than observed. For the Guo11 model, the predicted CFs are higher than the observational data for low-mass galaxies. Guo et al. (2011) argued that the large correlation signal could be due to the cosmology used in the Millennium Simulation, which assumes a value of σ_8 (= 0.9) that is higher than the latest WMAP-7 result (Komatsu et al. 2011). Other studies have, however, shown that a lower value of σ_8 is not sufficient to bring model results into agreement with observational measurements (Wang et al. 2008; Kang et al. 2012; Guo et al. 2012).
Figure 2. Left panel: the M_star-M_infall relations in the Wang06 HOD model and in the DLB07 and Guo11 semi-analytic models. The results of the DLB07 (Guo11) model have been shifted down (up) by 1 dex, and M_infall of satellite galaxies has been shifted by 0.03 dex, for visualization purposes. Error bars show the limits enclosing 68 per cent of the distribution. Right panel: the ratio between the stellar mass in DLB07 (red lines)/Guo11 (blue lines) and that in the Wang06 HOD model as a function of M_infall. The top and middle right panels give results for the median value for central and satellite galaxies. The bottom right panel shows the ratio between the scatter σ in the SAM and in the HOD model, with results for the DLB07 model shown in red and results for the Guo11 model in blue. Solid lines are for central galaxies, and dashed lines are for satellite galaxies. The semi-analytic models consistently predict higher stellar masses for a given halo mass, and more scatter, than the HOD model.



3.2 Rescaled SAMs 

In this section, we test whether SAM predictions can be brought into agreement with observational data, for both the SMF and CFs, by simply rescaling the M_star-M_infall relation in the SAMs to be the same as in the HOD model. For each M_infall bin, we rescale the stellar masses of galaxies to have the same median stellar mass as in the HOD, as well as the same scatter around the median value. The relative deviations from the median relation are not altered, i.e. in each halo mass bin, galaxies that are more massive than predicted by the median relation are still more massive in the rescaled catalogue. Satellite and central galaxies are rescaled separately. In other words, our working assumption is that the two SAMs populate the scatter in the M_star-M_infall relation correctly, but that the absolute value predicted for the galaxy stellar mass is offset with respect to the correct value by an amount equal to the offset with respect to the HOD median relation.
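The rescaling described above can be sketched as follows. This is our own minimal reading of the procedure, not the authors' code. The helper name is hypothetical, the scatter in each bin is assumed to be measured as the standard deviation of log(M_star) (the text does not specify the estimator), and the function would be called separately for centrals and for satellites.

```python
import numpy as np


def rescale_log_mstar(log_mstar, log_minfall, bin_edges, target_median, target_sigma):
    """Rank-preserving rescaling of SAM stellar masses, bin by bin in log10(M_infall).

    In each halo-mass bin, residuals about the SAM median are stretched so that
    their standard deviation (assumed scatter estimator) equals target_sigma, then
    shifted onto target_median[i]. Each galaxy keeps the sign and relative size
    of its deviation from the median.
    """
    log_mstar = np.asarray(log_mstar, dtype=float)
    log_minfall = np.asarray(log_minfall, dtype=float)
    out = np.full_like(log_mstar, np.nan)  # galaxies outside all bins stay NaN
    idx = np.digitize(log_minfall, bin_edges) - 1
    for i in range(len(bin_edges) - 1):
        sel = idx == i
        if sel.sum() < 2:
            continue  # too few galaxies to measure a scatter
        med = np.median(log_mstar[sel])
        sig = np.std(log_mstar[sel])
        scale = target_sigma / sig if sig > 0 else 0.0
        out[sel] = target_median[i] + (log_mstar[sel] - med) * scale
    return out
```

The design choice that matters here is that only a shift and a stretch are applied within each bin, so the ordering of galaxies inside the scatter, and hence any correlation with halo formation history, is carried over from the SAM unchanged.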

Results of this exercise are shown in Fig. 4. Red lines show the results for the rescaled DLB07 model, which reproduces both the observed SMF and the CFs very well. Note that for the CFs, the rescaled model is close to the original one, with only small differences for low-mass galaxies. As expected, when the M_star-M_infall relation is rescaled, the satellite fraction predicted by the rescaled model is consistent with that of the HOD, as shown in Fig. 3.

We also test two other simple models using the DLB07 predictions. In one case, we randomly remove a fraction of galaxies in each stellar mass bin so as to reproduce the observed galaxy SMF. Results of this exercise are shown as orange lines in Fig. 4. Since the original model reproduces the observed CFs quite well, randomly removing a subset of galaxies in each stellar mass bin does not alter this agreement. In the other case, we remove only satellite galaxies. Results for this case are shown in green and show that, while the SMF is adjusted to fit the observations, the CFs at small scales are strongly suppressed. These simple tests demonstrate that, at least for the DLB07 model, reducing the number of satellites alone is not the right way to obtain a model that matches both the SMF and CFs. Satellite galaxies are not the only galaxy type to be over-abundant: the number of centrals at the low-mass end also appears to be over-predicted in this model. Note that this over-abundance does not apply to galaxies in the mass range log(M_star/M⊙) = [10.27, 10.77], where the original DLB07 model fits both the SMF and CFs well, and only a few satellites need to be removed. For more massive galaxies, while the high-mass end of the observed SMF is very uncertain (Bernardi et al. 2010), the original DLB07 model can already be considered to do a good job of reproducing both the observed SMF and the measured CFs.

In summary, Fig. 4 shows that there are two possible ways to bring the predicted SMF and CFs from the DLB07 model into agreement with the data: (1) rescale the M_star-M_infall relation, to assign a lower galaxy mass to low-mass haloes; (2) randomly reduce the number of low-mass galaxies (both centrals and satellites).
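Option (2) amounts to random subsampling in stellar-mass bins. The sketch below is minimal and assumes that the target number of galaxies per bin has already been obtained from the observed SMF multiplied by the simulation volume. That conversion is not shown, and the helper name is hypothetical.

```python
import numpy as np


def subsample_to_smf(log_mstar, bin_edges, n_target, rng=None):
    """Boolean mask that randomly keeps at most n_target[i] galaxies in bin i.

    Galaxies are dropped irrespective of whether they are centrals or
    satellites. Bins that already hold fewer than n_target[i] objects are
    left untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    log_mstar = np.asarray(log_mstar, dtype=float)
    keep = np.ones(log_mstar.size, dtype=bool)
    idx = np.digitize(log_mstar, bin_edges) - 1
    for i in range(len(bin_edges) - 1):
        members = np.flatnonzero(idx == i)
        n_drop = members.size - int(n_target[i])
        if n_drop > 0:
            drop = rng.choice(members, size=n_drop, replace=False)
            keep[drop] = False
    return keep
```

Restricting `members` to satellites only would mimic the green-line test, which the text shows suppresses the small-scale CFs.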

The same rescaling does not work for the Guo11 model, as shown by the blue lines in Fig. 4. With the same M_star-M_infall relation, and hence a similar satellite fraction as in the HOD (dashed blue line in Fig. 3), the rescaled Guo11 model still over-predicts the CFs at low masses. This suggests that the distribution of galaxies within the scatter around the
Figure 4. SMF (upper left panel) and CFs in five stellar mass bins in different models: the DLB07 model with a random fraction of both centrals and satellites removed (orange lines, see text for details), the DLB07 model with satellite galaxies partially removed (green lines, see text for details), the rescaled DLB07 model (red lines), and the rescaled Guo11 model (blue lines). Note that for the two rescaled models, both the median M_star-M_infall relation and the scatter around the median are rescaled. Black symbols with error bars are SDSS DR7 results. The lower part of each panel shows the ratio between model results and observations. Of the two removal schemes, only removing part of both the central and the satellite galaxies allows the DLB07 model to reproduce both the SMF and CFs. Rescaling works for DLB07, but not for Guo11.



median M_star-M_infall relation significantly affects the predicted CFs.



4 THE SCATTER OF THE M_star-M_infall
RELATION: DEPENDENCE ON HALO
FORMATION TIME

In this section, we investigate the scatter in the M_star-M_infall relation in detail, to understand the differences between the models discussed in the previous section. As explained above, in the Wang06 model galaxy stellar masses are assigned assuming a random scatter around the median relation. In the SAMs, the scatter around the median relation is not 'assumed' but follows naturally from the scatter in the halo mass accretion history and the stochasticity of the physical processes that drive the formation and evolution of galaxies within haloes of fixed mass. Using predictions from the two SAMs, we can therefore check whether and how these processes influence the scatter in the M_star-M_infall relation.
Figure 3. Satellite fractions as a function of stellar mass in different models. The solid black, red and blue lines show the results of the Wang06, DLB07 and Guo11 models. The dotted black line is the Wang06 model combined with the galaxy information of the Guo11 model. The dashed red and blue lines are results of the rescaled DLB07 and Guo11 models discussed in Section 3.2. Results from the Yang et al. (2008) group catalogue are shown by black diamonds with error bars. Green stars with error bars are weak lensing results by Mandelbaum et al. (2006). The Wang06 model and the two rescaled SAMs give satellite fractions that are closer to the observational measurements.



4.1 Clustering of low- and high-stellar-mass
galaxies in a fixed halo mass bin

As a basic check, we can simply split galaxies at fixed halo mass into two sub-samples according to whether their stellar mass is above or below the median stellar mass in the bin. In the HOD, these two sub-samples have the same correlation function by construction (because the scatter is random). For the SAMs, we find that this is not the case for low-mass haloes. We show this in Fig. 5, where we plot the CFs of central and satellite galaxies in haloes with log(M_infall/h^-1 M⊙) = [11.3, 11.5]. Blue and red lines show the CFs of galaxies with stellar mass larger and smaller than the median stellar mass of all galaxies in the halo mass bin considered. Top panels are for the DLB07 model, while bottom panels correspond to the Guo11 predictions.
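The split used for Fig. 5 is simple to reproduce on a catalogue. The sketch below is a hypothetical NumPy-only helper that returns the indices of galaxies above, and at or below, the median log(M_star) within one log(M_infall) bin. Computing the projected correlation function of each subsample is left to whatever estimator one already uses.

```python
import numpy as np


def split_at_median(log_mstar, log_minfall, lo, hi):
    """Indices of galaxies with lo <= log10(M_infall) < hi, split at the bin median.

    Returns (above, below). Galaxies exactly at the median go to `below`,
    which is an arbitrary tie-breaking choice.
    """
    log_mstar = np.asarray(log_mstar, dtype=float)
    log_minfall = np.asarray(log_minfall, dtype=float)
    in_bin = np.flatnonzero((log_minfall >= lo) & (log_minfall < hi))
    med = np.median(log_mstar[in_bin])
    above = in_bin[log_mstar[in_bin] > med]
    below = in_bin[log_mstar[in_bin] <= med]
    return above, below
```

Under the random-scatter assumption of the HOD, the two index sets are statistically indistinguishable in position, so their CFs agree. Any difference measured on a SAM catalogue therefore signals non-random scatter.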

Fig. 5 shows that, in both models, galaxies that are more massive than the median cluster more strongly. The difference in the clustering signal is comparable in the two models when considering central galaxies only. For satellite galaxies, the effect is more prominent in the Guo11 model than in the DLB07 model, and the differences visible in the right panels of Fig. 5 strongly influence the clustering signal for all galaxies. We have checked that similar results are found in both SAMs combined with the higher-resolution Millennium-II Simulation (Boylan-Kolchin et al. 2009). These results show that galaxy stellar masses are not randomly distributed within the scatter for a given halo mass (see also Neistein et al. 2011b). The details of the scatter matter and significantly affect the predicted CFs. We stress that the two models considered use the same dark matter merger trees as basic input, and mainly differ in their treatment of the supernova feedback process. Therefore, our results demonstrate that the distribution of galaxy stellar masses with respect to the median relation can be significantly affected by a different modelling of baryonic physics.



4.2 The influence of assembly bias 

What causes the different clustering amplitudes shown in Fig. 5? In Section 1 we discussed the existence of assembly bias, which causes, at fixed halo mass, a higher clustering amplitude for haloes that assembled at higher redshift. It seems reasonable to expect that the results shown in Fig. 5 are related to assembly bias. We illustrate that this is indeed the case in Fig. 6, where we show the relation between the median stellar mass and the halo formation time in three halo mass bins in the DLB07 and Guo11 models. The halo formation time is defined as the time when 50 per cent of the final halo mass is assembled in a single object.
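Below is a minimal sketch of this definition. It assumes that the main-progenitor mass history is available at discrete snapshots, ordered from the final snapshot backwards. The helper name is hypothetical, the final mass is taken to be the first entry of the history, and the half-mass crossing is located by linear interpolation between snapshots (other interpolation choices are equally defensible). The sketch returns a redshift. Converting it to cosmic time requires the simulation cosmology, which is not shown.

```python
import numpy as np


def half_mass_formation_redshift(z, mass):
    """Redshift at which the main progenitor first exceeds half the final mass.

    z    : snapshot redshifts along the main branch, starting at the final snapshot
    mass : main-progenitor mass at each snapshot, same order; mass[0] is the final mass
    Walking back in time, find the first snapshot below half the final mass and
    interpolate linearly in redshift between it and the previous snapshot.
    """
    z = np.asarray(z, dtype=float)
    mass = np.asarray(mass, dtype=float)
    half = 0.5 * mass[0]
    below = np.flatnonzero(mass < half)
    if below.size == 0:
        return np.nan  # history never drops below half the final mass
    j = below[0]
    z_a, z_b = z[j - 1], z[j]
    m_a, m_b = mass[j - 1], mass[j]
    return z_a + (half - m_a) * (z_b - z_a) / (m_b - m_a)
```

For example, a history with masses 10, 8, 4, 2 at z = 0, 1, 2, 3 crosses half its final mass (5) between z = 1 and z = 2, and the interpolation places the crossing at z = 1.75.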

Results in blue show the halo mass bin log(M_infall/h^-1 M⊙) = [11.3, 11.5], which is the same mass range used in Fig. 5. We can see two clear trends: (i) at fixed M_infall, earlier-forming haloes contain more massive galaxies, indicating that assembly bias can indeed explain the results in Fig. 5; and (ii) at fixed M_infall, satellite galaxies on average reside in earlier-forming haloes (see also Neistein et al. 2011a). This effect is clearly more pronounced in Guo11 than in DLB07, indicating that the details of the baryonic physics have a substantial influence on the strength of assembly bias, which is reflected in a dependence of galaxy stellar mass on halo formation time.
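The quantity plotted in Fig. 6 is the median stellar mass as a function of halo formation time within a fixed M_infall bin. It can be computed with a generic binned-median helper such as the hypothetical one below. Under the random-scatter HOD this curve would be flat within the noise. In the SAMs, according to the text, it rises towards earlier formation times in the lowest halo-mass bin.

```python
import numpy as np


def median_in_bins(x, y, bin_edges):
    """Median of y in half-open bins of x (NaN where a bin is empty).

    Intended use: x = halo formation redshift (or time), y = log10(M_star),
    for galaxies already restricted to one log10(M_infall) bin.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.digitize(x, bin_edges) - 1
    out = np.full(len(bin_edges) - 1, np.nan)
    for i in range(len(bin_edges) - 1):
        sel = idx == i
        if sel.any():
            out[i] = np.median(y[sel])
    return out
```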

We also check the same relation in two higher halo mass bins, log(M_infall/h^-1 M⊙) = [12.3, 12.5] and [13.3, 13.5], shown in red and green respectively. While the result that satellites form earlier persists, we no longer see a clear correlation between stellar mass and time of assembly.

In summary, we conclude that at fixed, low halo mass, galaxies with different stellar masses are clustered differently: lower-mass galaxies are less clustered than higher-mass galaxies. This is because the 'over-massive' galaxies reside in haloes that form early, while the 'under-massive' galaxies are in haloes that form late. Therefore, SAM galaxies do not populate the scatter of the M_star-M_infall relation randomly. The clustering properties of galaxies are influenced by halo assembly bias, which is by construction not included in HOD models.



5 DISCUSSION AND CONCLUSIONS 

In this paper, we compare results from the empirical HOD model of Wang et al. (2006) with predictions from the semi-analytic models presented in De Lucia & Blaizot (2007) and Guo et al. (2011), both based on the halo merger trees extracted from the Millennium Simulation. By construction, the HOD model is able to reproduce simultaneously the galaxy SMF and the CFs, down to the stellar mass limit
Figure 5. CFs of galaxies in subsamples split by stellar mass in the halo mass bin log(M_infall/h^-1 M⊙) = [11.3, 11.5]. The top and bottom panels are for the DLB07 and Guo11 models, respectively. For each model, results for subsamples of all galaxies, central galaxies and satellite galaxies are shown from left to right. In each panel, the black line shows the CF for the whole sample, and blue/red lines show the CFs for subsamples with stellar masses above/below the median. For both centrals and satellites, galaxies with stellar mass above the median cluster more strongly than those below the median, and the effect is stronger in Guo11 for satellites.



of the SDSS. The semi-analytic models have problems in re- 
producing both these observations. In particular, the DLB07 
model reproduces quite well the dependence of the clustering 
amplitude on mass but over-predicts the number densities 
of low-to-intermediate mass galaxies. In contrast, the Guoll 
model reproduces the stellar galaxy mass function down to 
the lowest mass measured (it does so by construction), but 
over-predicts the clustering amplitude for low-mass galaxies. 
These different predictions can be explained by comparing 
the predicted M sta r - Afj n f a n relations with that obtained by 
the HOD approach. 

We demonstrate that rescaling the results of the semi-
analytic models so that they reproduce the same median
M_star–M_infall relation and scatter as found in the HOD does
not necessarily suffice to bring them into agreement with both
observational measurements used to constrain the HOD. Instead,
we show that the way model galaxies populate the scatter
around the median relation matters. In the HOD model, as in
most other models in the literature, the scatter around the
M_star–M_infall relation is modelled as a random Gaussian
distribution. In the semi-analytic models we use, stellar masses
exhibit a clear dependence on halo formation time, with stronger
trends for low-mass galaxies. At given M_infall, galaxies with
larger stellar mass reside in haloes that formed earlier, and
such haloes have a higher clustering amplitude than haloes of
the same mass with later formation times (Gao et al. 2005).
The influence of assembly bias on galaxies is stronger in the
Guo11 model than in the DLB07 model, and results in an excess
of clustering signal for low-mass galaxies.
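One simple way to perform the rescaling described above is sketched below, as an assumption rather than the exact procedure of this paper: in each M_infall bin, the SAM log stellar masses are shifted and stretched linearly so that their median and standard deviation match target HOD values. Because the map is monotonic, each galaxy keeps its rank within the scatter, so any dependence on halo formation time survives; the HOD-style alternative redraws masses from a Gaussian and erases it. The function names and the numbers in the test are hypothetical, and the scatter is assumed to be measured as a standard deviation of log M_star.

```python
import random
import statistics


def rescale_bin(log_mstar, target_median, target_sigma):
    """Linearly map the log stellar masses of galaxies in one M_infall bin
    onto a target median and scatter (scatter taken as the standard deviation).

    The map is monotonic, so each galaxy keeps its rank within the scatter:
    any correlation with halo formation time survives the rescaling.
    """
    med = statistics.median(log_mstar)
    sigma = statistics.pstdev(log_mstar)
    if sigma == 0:
        raise ValueError("input bin has zero scatter; cannot rescale")
    stretch = target_sigma / sigma
    return [target_median + (x - med) * stretch for x in log_mstar]


def randomize_bin(log_mstar, target_median, target_sigma, seed=0):
    """HOD-style alternative: redraw every log stellar mass from a Gaussian
    with the target median and scatter, independent of halo properties."""
    rng = random.Random(seed)
    return [rng.gauss(target_median, target_sigma) for _ in log_mstar]
```

Both functions produce the same median relation and scatter; only the first preserves which haloes host the over- and under-massive galaxies, which is the distinction that determines the clustering of stellar-mass-selected samples.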

Does assembly bias exist in the real Universe? The issue
is still a matter of debate. Tinker et al. (2008) conclude
that there is no evidence for assembly bias for low-mass
galaxies, from the fact that HOD models match the observed
void statistics of red and blue galaxies. If the effect on
galaxies is present at the level found in the DLB07 model for
central and especially satellite galaxies, it might be diffi-
cult to distinguish it from purely random scatter using
observational constraints such as the SMF and the CFs in
different stellar mass bins. Measurements of the correlation
function for galaxies in fixed stellar mass bins, but split
by colour and/or specific star formation rate, may help to
answer this question. We address this issue in a companion
paper.

Figure 6. The relation between stellar mass and the redshift of halo formation for centrals (squares) and satellites (stars) in the DLB07
model and the Guo11 model, for three halo mass bins of log(M_infall/h^-1 M_⊙) = [11.3, 11.5], [12.3, 12.5] and [13.3, 13.5]. For a given
M_infall, galaxies are binned according to the formation time of their host haloes, from left to right: the 16% that formed latest, the 10%
with formation time around the median, and the 16% that formed earliest. For the lowest halo mass bin considered, there is a clear
dependence of galaxy stellar mass on halo formation time, which is stronger in Guo11 than in DLB07.

If assembly bias significantly affects galaxies in the real
Universe, as it does in the SAMs, it might pose problems
for models that neglect this effect, such as HOD and abun-
dance matching models, in particular for low-mass galaxies
and satellites. For example, for a given M_star–M_infall rela-
tion and a given scatter around that relation, assuming ran-
dom scatter will produce lower CFs than assuming a scatter
that accounts for assembly bias. If the correlation function
is then reproduced by coincidence, one may draw wrong con-
clusions about the importance of other effects that should
have made clustering weaker, such as tidal stripping and
reduced merger times of galaxies. Finally, if significant, as-
sembly bias is relevant for precision measurements of cos-
mological parameters. Future HOD and abundance match-
ing models would need to account for non-random scatter
that includes the assembly bias effect.
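The statement that random scatter yields weaker clustering than scatter correlated with formation time can be illustrated with a deliberately simple linear-bias toy model, which is our own sketch and not a calculation from this paper. We assume (i) that the offset d from the median log M_star and a standardized formation-time deviate z are bivariate Gaussian with correlation rho; (ii) that halo bias depends linearly on z through a hypothetical slope a, b(z) = b0 (1 + a z); and (iii) that on large scales the correlation function scales as b². Selecting galaxies above the median stellar mass in a narrow M_infall bin then picks haloes with mean z = rho √(2/π).

```python
import math


def clustering_boost(rho, a, above=True):
    """Large-scale ratio xi_assembly / xi_random for galaxies above (or below)
    the median log stellar mass in one narrow M_infall bin (toy Gaussian model).

    rho: correlation between the stellar-mass offset and the standardized
         formation-time deviate z; rho = 0 is the random-scatter HOD case.
    a:   hypothetical slope of halo bias with z, b(z) = b0 * (1 + a * z).
    """
    sign = 1.0 if above else -1.0
    # For standard normals with correlation rho, E[z | d > 0] = rho * sqrt(2/pi).
    mean_z_selected = sign * rho * math.sqrt(2.0 / math.pi)
    bias_ratio = 1.0 + a * mean_z_selected  # b_selected / b0
    return bias_ratio ** 2  # linear theory: xi scales as b^2
```

For rho = 0 the ratio is exactly 1, since a random subsample carries the mean halo bias; for rho > 0 the above-median sample is boosted and the below-median sample suppressed. An HOD with random scatter therefore under-predicts this split, and an independent effect that lowers clustering could be wrongly invoked to compensate.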